长文件摘要是自然语言处理领域的重要且艰巨的任务。良好的长文件摘要表现揭示了模型对人类语言的理解。目前,大多数研究侧重于如何修改变压器的注意机制,实现更高的胭脂分数。数据预处理和后处理的研究相对较少。在本文中,我们使用两个预处理方法和后处理方法,并分析了这些方法对各种长文件摘要模型的影响。
translated by 谷歌翻译
A noisy training set usually leads to the degradation of the generalization and robustness of neural networks. In this paper, we propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method, to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under some conditions. Under general scenarios, the conditions may be no longer satisfied; and some noisy data are falsely selected as clean data. To solve this problem, we propose a data-adaptive method for Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which is provable to control the False-Selection-Rate (FSR) in the selected clean data. To improve the efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel to make the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
translated by 谷歌翻译
Improving the visual quality of the given degraded observation by correcting exposure level is a fundamental task in the computer vision community. Existing works commonly lack adaptability towards unknown scenes because of the data-driven patterns (deep networks) and limited regularization (traditional optimization), and they usually need time-consuming inference. These two points heavily limit their practicability. In this paper, we establish a Practical Exposure Corrector (PEC) that assembles the characteristics of efficiency and performance. To be concrete, we rethink the exposure correction to provide a linear solution with exposure-sensitive compensation. Around generating the compensation, we introduce an exposure adversarial function as the key engine to fully extract valuable information from the observation. By applying the defined function, we construct a segmented shrinkage iterative scheme to generate the desired compensation. Its shrinkage nature supplies powerful support for algorithmic stability and robustness. Extensive experimental evaluations fully reveal the superiority of our proposed PEC. The code is available at https://rsliu.tech/PEC.
translated by 谷歌翻译
Developing autonomous vehicles (AVs) helps improve the road safety and traffic efficiency of intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway or urban areas. However, most existing learning-based models trained with fixed datasets may perform poorly in continuously changing scenarios. Specifically, they may not perform well in learned scenarios after learning the new one. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory predictions in continuous scenarios, where catastrophic forgetting may happen. To handle this problem, first, a novel continual learning (CL) approach for vehicle trajectory prediction is proposed in this paper. Then, inspired by brain science, a dynamic memory mechanism is developed by utilizing the measurement of traffic divergence between scenarios, which balances the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, which mitigates catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM
translated by 谷歌翻译
Massively multi-task learning with large language models has recently made substantial progress on few-shot generalization. However, this is usually performed in a centralized learning fashion, ignoring the privacy sensitivity issue of (annotated) data used in multiple tasks. To mitigate this issue, we propose FewFedWeight, a few-shot federated learning framework across multiple tasks, to achieve the best of both worlds: privacy preservation and cross-task generalization. FewFedWeight trains client models in isolated devices without sharing data. It broadcasts the global model in the server to each client and produces pseudo data for clients so that knowledge from the global model can be explored to enhance few-shot learning of each client model. An energy-based algorithm is further proposed to weight pseudo samples in order to reduce the negative impact of noise from the generated pseudo data. Adaptive model weights of client models are also tuned according to their performance. We use these model weights to dynamically aggregate client models to update the global model. Experiments on 118 NLP tasks show that FewFedWeight can significantly improve the performance of client models on 61% tasks with an average performance improvement rate of 30.5% over the baseline and substantially outperform FedAvg and other decentralized learning methods.
translated by 谷歌翻译
Knowledge distillation (KD) has been widely used for model compression and knowledge transfer. Typically, a big teacher model trained on sufficient data transfers knowledge to a small student model. However, despite the success of KD, little effort has been made to study whether KD leaks the training data of the teacher model. In this paper, we experimentally reveal that KD suffers from the risk of privacy leakage. To alleviate this issue, we propose a novel knowledge distillation method, swing distillation, which can effectively protect the private information of the teacher model from flowing to the student model. In our framework, the temperature coefficient is dynamically and adaptively adjusted according to the degree of private information contained in the data, rather than a predefined constant hyperparameter. It assigns different temperatures to tokens according to the likelihood that a token in a position contains private information. In addition, we inject noise into soft targets provided to the student model, in order to avoid unshielded knowledge transfer. Experiments on multiple datasets and tasks demonstrate that the proposed swing distillation can significantly reduce (by over 80% in terms of canary exposure) the risk of privacy leakage in comparison to KD with competitive or better performance. Furthermore, swing distillation is robust against the increasing privacy budget.
translated by 谷歌翻译
The task of Few-shot learning (FSL) aims to transfer the knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information. It is currently an important research question and has great practical values in the real-world applications. Despite extensive previous efforts are made on few-shot learning tasks, we emphasize that most existing methods did not take into account the distributional shift caused by sample selection bias in the FSL scenario. Such a selection bias can induce spurious correlation between the semantic causal features, that are causally and semantically related to the class label, and the other non-causal features. Critically, the former ones should be invariant across changes in distributions, highly related to the classes of interest, and thus well generalizable to novel classes, while the latter ones are not stable to changes in the distribution. To resolve this problem, we propose a novel data augmentation strategy dubbed as PatchMix that can break this spurious dependency by replacing the patch-level information and supervision of the query images with random gallery images from different classes from the query ones. We theoretically show that such an augmentation mechanism, different from existing ones, is able to identify the causal features. To further make these features to be discriminative enough for classification, we propose Correlation-guided Reconstruction (CGR) and Hardness-Aware module for instance discrimination and easier discrimination between similar classes. Moreover, such a framework can be adapted to the unsupervised FSL scenario.
translated by 谷歌翻译
Deep learning-based physical-layer secret key generation (PKG) has been used to overcome the imperfect uplink/downlink channel reciprocity in frequency division duplexing (FDD) orthogonal frequency division multiplexing (OFDM) systems. However, existing efforts have focused on key generation for users in a specific environment where the training samples and test samples obey the same distribution, which is unrealistic for real world applications. This paper formulates the PKG problem in multiple environments as a learning-based problem by learning the knowledge such as data and models from known environments to generate keys quickly and efficiently in multiple new environments. Specifically, we propose deep transfer learning (DTL) and meta-learning-based channel feature mapping algorithms for key generation. The two algorithms use different training methods to pre-train the model in the known environments, and then quickly adapt and deploy the model to new environments. Simulation results show that compared with the methods without adaptation, the DTL and meta-learning algorithms both can improve the performance of generated keys. In addition, the complexity analysis shows that the meta-learning algorithm can achieve better performance than the DTL algorithm with less time, lower CPU and GPU resources.
translated by 谷歌翻译
链接的语音实体旨在识别和消除语言中的命名实体。常规方法严重遭受了不受限制的语音样式和ASR系统产生的嘈杂笔录。在本文中,我们提出了一种名为“知识增强命名实体识别”(KENER)的新颖方法,该方法致力于通过在实体识别阶段无痛地纳入适当的知识来改善鲁棒性,从而改善实体联系的整体性能。肯纳(Kener)首先检索未提及的句子的候选实体,然后利用实体描述作为额外的信息来帮助识别提及。当输入短或嘈杂时,由密集检索模块检索的候选实体特别有用。此外,我们研究了各种数据采样策略和设计有效的损失功能,以提高识别和歧义阶段中检索实体的质量。最后,将与过滤模块的链接作为最终保障措施应用,从而可以过滤出错误认可的提及。我们的系统在NLPCC-2022共享任务2的轨道1中获得第一名,并在轨道1中获得第一名。
translated by 谷歌翻译
时间序列分类是现实世界中的重要问题。由于其非平稳属性随着时间的推移而变化,因此建立泛化模型以表现出来的分布仍然具有挑战性。在本文中,我们建议从分布的角度查看时间序列分类问题。我们认为时间复杂性归因于其中未知的潜在分布。为此,我们建议多元化学习时间序列分类的广义表示。多元化进行了一个迭代过程:它首先通过对抗训练获得了最坏情况的分布场景,然后与获得的子域的分布匹配。我们还提供了一些理论见解。我们进行有关手势识别,语音命令识别,可穿戴压力和影响检测的实验,以及基于传感器的人类活动识别,在不同的情况下总共有七个数据集。结果表明,多样化的多样化大大优于其他基线,并通过定性和定量分析有效地表征了潜在分布。
translated by 谷歌翻译